Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data
Nolte, Henrik, Finck, Michèle, Meding, Kristof
Does GPT know you? The answer depends on your level of public recognition; however, if your information was available on a website, the answer is probably yes. All Large Language Models (LLMs) memorize training data to some extent. If an LLM training corpus includes personal data, it also memorizes personal data. Developing an LLM typically involves processing personal data, which falls directly within the scope of data protection laws. If a person is identified or identifiable, the implications are far-reaching: the AI system is subject to EU General Data Protection Regulation requirements even after the training phase is concluded. To support our arguments: (1.) We reiterate that LLMs output training data at inference time, be it verbatim or in generalized form. (2.) We show that some LLMs can thus be considered personal data on their own. This triggers a cascade of data protection implications such as data subject rights, including rights to access, rectification, or erasure. These rights extend to the information embedded within the AI model. (3.) This paper argues that machine learning researchers must acknowledge the legal implications of LLMs as personal data throughout the full ML development lifecycle, from data collection and curation to model provision on, e.g., GitHub or Hugging Face. (4.) We propose different ways for the ML research community to deal with these legal implications. Our paper serves as a starting point for improving the alignment between data protection law and the technical capabilities of LLMs. Our findings underscore the need for more interaction between the legal domain and the ML community.
- North America > United States (0.47)
- Europe > France (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (4 more...)
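The memorization claim in the abstract above can be probed with a simple verbatim-overlap check. The sketch below is a minimal illustration, not the paper's method; the corpus and "leaked" output strings are invented for demonstration. It flags which n-grams of a model output also occur verbatim in a training corpus.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(output_text, corpus_texts, n=5):
    """Fraction of the output's n-grams that occur verbatim in the corpus."""
    out = ngrams(output_text.split(), n)
    corpus = set()
    for doc in corpus_texts:
        corpus |= ngrams(doc.split(), n)
    return len(out & corpus) / len(out) if out else 0.0

# Hypothetical training document containing personal data.
corpus = ["alice smith lives at 12 oak street in tubingen"]
leak = "the model says alice smith lives at 12 oak street in tubingen"
print(verbatim_overlap(leak, corpus))  # → 0.625
```

A high overlap score for outputs about a named person would be one signal that the model has memorized (and can emit) that person's data, which is the trigger for the GDPR consequences the paper discusses.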
The explanation dialogues: an expert focus study to understand requirements towards explanations within the GDPR
State, Laura, Colmenarejo, Alejandra Bringas, Beretta, Andrea, Ruggieri, Salvatore, Turini, Franco, Law, Stephanie
Explainable AI (XAI) provides methods to understand non-interpretable machine learning models. However, we have little knowledge about what legal experts expect from these explanations, including their legal compliance with, and value against European Union legislation. To close this gap, we present the Explanation Dialogues, an expert focus study to uncover the expectations, reasoning, and understanding of legal experts and practitioners towards XAI, with a specific focus on the European General Data Protection Regulation. The study consists of an online questionnaire and follow-up interviews, and is centered around a use-case in the credit domain. We extract both a set of hierarchical and interconnected codes using grounded theory, and present the standpoints of the participating experts towards XAI. We find that the presented explanations are hard to understand and lack information, and discuss issues that can arise from the different interests of the data controller and subject. Finally, we present a set of recommendations for developers of XAI methods, and indications of legal areas of discussion. Among others, recommendations address the presentation, choice, and content of an explanation, technical risks as well as the end-user, while we provide legal pointers to the contestability of explanations, transparency thresholds, intellectual property rights as well as the relationship between involved parties.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > Europe Government (0.48)
A Personal data Value at Risk Approach
What if the main data protection vulnerability is risk management? Data protection merges three disciplines: data protection law, information security, and risk management. Nonetheless, very little research has been done in the field of data protection risk management, where subjectivity and superficiality dominate the state of the art. Since the GDPR tells you what to do, but not how to do it, the approach to GDPR compliance remains a gray zone, where rule-of-thumb practice is the trend. Considering that the most important goal of risk management is to reduce uncertainty in order to make informed decisions, risk management for the protection of the rights and freedoms of data subjects cannot be disconnected from the impact materialization that data controllers and processors need to assess. This paper proposes a quantitative approach to data protection risk-based compliance from a data controller's perspective, with the aim of proposing a mindset change, where data protection impact assessments can be improved by using data protection analytics, quantitative risk analysis, and calibrated expert opinions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > France > Hauts-de-France > Nord > Lille (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (7 more...)
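The quantitative, calibrated approach the abstract argues for can be made concrete with a small Monte Carlo loss model. The sketch below is illustrative only; the breach probability and loss bounds are invented placeholders, not figures from the paper. It simulates annual breach losses from expert-calibrated bounds and reads off a percentile of the loss distribution as a "personal data value at risk".

```python
import math
import random

def personal_data_var(n_sims=100_000, p_breach=0.3, loss_low=10_000,
                      loss_high=500_000, percentile=0.95, seed=42):
    """Monte Carlo estimate of annual loss at a given percentile.

    Each simulated year: a breach occurs with probability p_breach; if it
    does, the loss is drawn log-uniformly between calibrated expert bounds.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        if rng.random() < p_breach:
            # Log-uniform draw between the expert interval bounds.
            u = rng.random()
            losses.append(math.exp(math.log(loss_low)
                                   + u * (math.log(loss_high) - math.log(loss_low))))
        else:
            losses.append(0.0)
    losses.sort()
    return losses[int(percentile * n_sims)]

print(round(personal_data_var()))
```

Replacing a single rule-of-thumb severity rating with a simulated distribution is the kind of "informed decision" framing the paper contrasts with superficial risk matrices.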
A Hate Speech Moderated Chat Application: Use Case for GDPR and DSA Compliance
Fillies, Jan, Mitsikas, Theodoros, Schäfermeier, Ralph, Paschke, Adrian
The detection of hate speech or toxic content online is a complex and sensitive issue. While the identification itself is highly dependent on the context of the situation, sensitive personal attributes such as age, language, and nationality are rarely available due to privacy concerns. Additionally, platforms struggle with a wide range of local jurisdictions regarding online hate speech and the evaluation of content based on their internal ethical norms. This research presents a novel approach that demonstrates a GDPR-compliant application capable of implementing legal and ethical reasoning into the content moderation process. The application increases the explainability of moderation decisions by utilizing user information. Two use cases fundamental to online communication are presented and implemented using technologies such as GPT-3.5, Solid Pods, and the rule language Prova. The first use case demonstrates the scenario of a platform aiming to protect adolescents from potentially harmful content by limiting the ability to post certain content when minors are present. The second use case aims to identify and counter problematic statements online by providing counter hate speech. The counter hate speech is generated using personal attributes to appeal to the user. This research lays the groundwork for future DSA compliance of online platforms. The work proposes a novel approach to reason within different legal and ethical definitions of hate speech and plan the fitting counter hate speech. Overall, the platform provides a fitted protection to users and a more explainable and individualized response. The hate speech detection service, the chat platform, and the reasoning in Prova are discussed, and the potential benefits for content moderation and algorithmic hate speech detection are outlined. A selection of important aspects for DSA compliance is outlined.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Greece (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- (7 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
A BERT-based Empirical Study of Privacy Policies' Compliance with GDPR
Zhang, Lu, Moukafih, Nabil, Alamri, Hamad, Epiphaniou, Gregory, Maple, Carsten
Since its implementation in May 2018, the General Data Protection Regulation (GDPR) has prompted businesses to revisit and revise their data handling practices to ensure compliance. The privacy policy, which serves as the primary means of informing users about their privacy rights and the data practices of companies, has been significantly updated by numerous businesses post-GDPR implementation. However, many privacy policies remain packed with technical jargon, lengthy explanations, and vague descriptions of data practices and user rights. This makes it a challenging task for users and regulatory authorities to manually verify the GDPR compliance of these privacy policies. In this study, we aim to address the challenge of compliance analysis between GDPR (Article 13) and privacy policies for 5G networks. We manually collected privacy policies from almost 70 different 5G MNOs, and we utilized an automated BERT-based model for classification. We show that an encouraging 51% of companies demonstrate strong adherence to GDPR. In addition, we present the first study that provides current empirical evidence on the readability of privacy policies for 5G networks. We adopted a readability analysis toolset that incorporates various established readability metrics. The findings empirically show that the readability of the majority of current privacy policies remains a significant challenge. Hence, 5G providers need to invest considerable effort into revising these documents to enhance both their utility and the overall user experience.
- North America > United States > Pennsylvania > Dauphin County > Harrisburg (0.04)
- Europe > United Kingdom > England > West Midlands > Coventry (0.04)
- Asia > China > Yunnan Province > Kunming (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Education > Educational Setting > Higher Education (0.47)
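The readability analysis mentioned in the abstract rests on established metrics. As a self-contained illustration (independent of the paper's actual toolset), the Flesch Reading Ease score can be computed with a heuristic vowel-group syllable counter:

```python
import re

def count_syllables(word):
    """Heuristic: count vowel groups; every word has at least one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(flesch_reading_ease("The cat sat. The dog ran."))          # easy text: high score
print(flesch_reading_ease(
    "Notwithstanding contractual obligations, processors implement "
    "organizational measures."))                                  # jargon: very low score
```

Scores below roughly 30 correspond to "very difficult" text, the band into which the abstract's findings place most current privacy policies.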
ROI: A method for identifying organizations receiving personal data
Rodriguez, David, Del Alamo, Jose M., Cozar, Miguel, Garcia, Boni
The distributed nature of the Internet further facilitates sharing these data with organizations worldwide [1]. Identifying the organizations that receive these personal data is becoming increasingly crucial for different stakeholders. For example, supervisory authorities may leverage this information to conduct investigations on the relationship between the source and destination of some personal data flows to understand a system's compliance with, for instance, legal requirements for international transfers of personal data [2]. Also, privacy and legal researchers can use this information to discover what companies are collecting massive amounts of personal data [3]. Additionally, app and web developers may want to check what organizations they send their users' personal data to, sometimes even without their knowledge [4], to meet transparency requirements set, e.g., by privacy regulations. Even app marketplaces can take advantage of it in their app review processes (e.g.
- Europe > Spain > Galicia > Madrid (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom (0.04)
- Asia > China (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Web (1.00)
- Information Technology > Communications > Networks (1.00)
- (4 more...)
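The identification task described above starts from observed request URLs and maps them to the organizations behind them. A toy sketch of that pipeline is below; the domain-to-organization table, the naive registrable-domain guess, and the example URLs are all assumptions for illustration (a real system would rely on WHOIS, certificate, or curated datasets rather than a two-entry dictionary).

```python
from urllib.parse import urlparse

# Hypothetical lookup table; real mappings require external data sources.
ORG_TABLE = {
    "doubleclick.net": "Google LLC",
    "facebook.com": "Meta Platforms, Inc.",
}

def receiving_orgs(request_urls):
    """Map observed outgoing request URLs to receiving organizations."""
    orgs = set()
    for url in request_urls:
        host = urlparse(url).hostname or ""
        # Naive registrable-domain guess: keep the last two labels.
        domain = ".".join(host.split(".")[-2:])
        orgs.add(ORG_TABLE.get(domain, "unknown"))
    return orgs

print(receiving_orgs(["https://ads.doubleclick.net/pixel?x=1"]))  # → {'Google LLC'}
```

Even this crude mapping shows why the problem matters for the stakeholders listed: the hostname alone rarely names the organization that actually receives the data.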
A Fine-grained Chinese Software Privacy Policy Dataset for Sequence Labeling and Regulation Compliant Identification
Zhao, Kaifa, Yu, Le, Zhou, Shiyao, Li, Jing, Luo, Xiapu, Chiu, Yat Fei Aemon, Liu, Yutong
Privacy protection attracts great attention both at the legal level and in user awareness. To protect user privacy, countries enact laws and regulations requiring software privacy policies to regulate their behavior. However, privacy policies are written in natural language with many legal terms and software jargon that prevent users from understanding and even reading them. It is desirable to use NLP techniques to analyze privacy policies to help users understand them. Furthermore, existing datasets ignore legal requirements and are limited to English. In this paper, we construct the first Chinese privacy policy dataset, namely CA4P-483, to facilitate sequence labeling tasks and regulation compliance identification between privacy policies and software. Our dataset includes 483 Chinese Android application privacy policies, over 11K sentences, and 52K fine-grained annotations. We evaluate families of robust and representative baseline models on our dataset. Based on baseline performance, we provide findings and potential research directions on our dataset. Finally, we investigate the potential applications of CA4P-483 combining regulation requirements and program analysis.
- Asia > China > Hong Kong (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
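Sequence labeling over privacy-policy sentences of the kind the dataset above supports is typically cast as BIO tagging. A minimal sketch of span recovery from BIO tags follows; the example sentence and the `DATA` label are invented for illustration and are not drawn from CA4P-483.

```python
def bio_spans(tokens, tags):
    """Recover (label, text) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        inside = label is not None and tag == "I-" + label
        if label is not None and not inside:
            spans.append((label, " ".join(tokens[start:i])))
            label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

tokens = "we collect your device identifier and location".split()
tags = ["O", "O", "O", "B-DATA", "I-DATA", "O", "B-DATA"]
print(bio_spans(tokens, tags))  # → [('DATA', 'device identifier'), ('DATA', 'location')]
```

Fine-grained annotations like these are what let a downstream system match declared data practices against a regulation's requirements.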
Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning
Rahat, Tamjid Al, Le, Tu, Tian, Yuan
Since GDPR came into force in May 2018, companies have worked on their data practices to comply with this privacy law. In particular, since the privacy policy is the essential communication channel for users to understand and control their privacy, many companies updated their privacy policies after GDPR was enforced. However, most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights. Therefore, it is unclear whether they comply with GDPR. In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 GDPR requirements and develop a Convolutional Neural Network (CNN)-based model that classifies the privacy policies with an accuracy of 89.2%. We apply our model to measure compliance in these privacy policies. Our results show that even after GDPR went into effect, 97% of websites still fail to comply with at least one requirement of GDPR.
- North America > United States > California > San Francisco County > San Francisco (0.28)
- Europe > United Kingdom (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- (9 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
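The headline measurement above (97% of websites fail at least one requirement) reduces to a simple aggregation over per-site classifier outputs. The sketch below shows that aggregation with made-up toy predictions; the site names and prediction sets are hypothetical, and the classifier itself (the paper's CNN) is not reproduced here.

```python
def noncompliance_rate(site_predictions):
    """Share of sites missing at least one labeled GDPR disclosure requirement.

    Each entry maps a site to the set of requirement ids (0..17) its
    privacy policy was predicted to satisfy.
    """
    requirements = set(range(18))  # the 18 labeled GDPR requirements
    failing = [s for s, met in site_predictions.items() if requirements - met]
    return len(failing) / len(site_predictions)

toy = {
    "site-a.example": set(range(18)),  # satisfies all 18
    "site-b.example": set(range(17)),  # misses requirement 17
    "site-c.example": {0, 1, 2},       # misses most
}
print(round(noncompliance_rate(toy), 3))  # → 0.667
```

The "at least one" criterion is strict by design: a single missing Article 13 disclosure already means the policy is not fully compliant.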
UK Uber drivers are taking the algorithm to court – TechCrunch
A group of U.K. Uber drivers has launched a legal challenge against the company's subsidiary in the Netherlands. The complaints relate to access to personal data and algorithmic accountability. Uber drivers and Uber Eats couriers are being invited to join the challenge, which targets Uber's use of profiling and data-fueled algorithms to manage gig workers in Europe. Platform workers involved in the case are also seeking to exercise a broader suite of data access rights baked into EU data protection law. It looks like a fascinating test of how far existing legal protections wrap around automated decisions at a time when regional lawmakers are busy drawing up a risk-based framework for regulating applications of artificial intelligence. Many uses of AI technology look set to remain subject only to protections baked into the existing General Data Protection Regulation (GDPR).
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.96)
Learning Smooth and Fair Representations
Gitiaux, Xavier, Rangwala, Huzefa
Organizations that own data face increasing legal liability for its discriminatory use against protected demographic groups, extending to contractual transactions involving third parties' access to and use of the data. This is problematic, since the original data owner cannot ex ante anticipate all its future uses by downstream users. This paper explores the upstream ability to preemptively remove the correlations between features and sensitive attributes by mapping features to a fair representation space. Our main result shows that the fairness measured by the demographic parity of the representation distribution can be certified from a finite sample if and only if the chi-squared mutual information between features and representations is finite. Empirically, we find that smoothing the representation distribution provides generalization guarantees for fairness certificates, which improves upon existing fair representation learning approaches. Moreover, we do not observe that smoothing the representation distribution degrades the accuracy of downstream tasks compared to state-of-the-art methods in fair representation learning.
- North America > United States > Virginia > Fairfax County > Fairfax (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (0.67)
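The demographic-parity notion invoked in the abstract can be checked empirically from samples. The sketch below is not the paper's certification procedure; it only computes the parity gap of a binary predictor across a binary sensitive attribute, with toy data invented for illustration.

```python
def demographic_parity_gap(predictions, groups):
    """|P(yhat=1 | g=0) - P(yhat=1 | g=1)| estimated from paired samples."""
    rates = {}
    for g in (0, 1):
        ys = [y for y, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(ys) / len(ys)
    return abs(rates[0] - rates[1])

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, groups))  # → 0.5
```

The paper's contribution is about when such a sample estimate generalizes, i.e., when a finite-sample gap certifies the parity of the underlying representation distribution; the finite chi-squared mutual information condition is what makes that certification possible.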